Robust speaker identification based on perceptual log area ratio and Gaussian mixture models
Abstract
This paper presents a new feature for speaker identification, the perceptual log area ratio (PLAR). PLAR is closely related to the log area ratio (LAR) feature, but it is derived from perceptual linear prediction (PLP) rather than linear predictive coding (LPC), which makes it more robust to noise than LAR. PLAR, LAR and mel frequency cepstral coefficient (MFCC) features were tested in a Gaussian mixture model (GMM) based speaker identification system. F-ratio feature analysis showed that the lower-order PLAR and LAR coefficients are superior in classification performance to their MFCC counterparts. The text-independent, closed-set speaker identification accuracies on the KING, YOHO and down-sampled TIMIT databases were 85.29%, 97.045% and 98.81% using PLAR; 61.76%, 94.76% and 97.92% using LAR; and 84.31%, 96.48% and 96.73% using MFCC. These results show that PLAR outperforms LAR and MFCC in both clean and noisy environments.
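As a rough illustration of the LAR feature the abstract builds on, the sketch below computes log area ratios for one speech frame from its autocorrelation, using the Levinson-Durbin recursion to obtain reflection coefficients. It is not the authors' implementation: the function names are hypothetical, the sign convention log((1 - k) / (1 + k)) is one of two conventions in common use, and windowing and pre-emphasis are omitted. Per the abstract, PLAR would apply the same final mapping to an all-pole model obtained from PLP analysis instead of plain LPC; that perceptual front end is not shown here.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame (no windowing applied)."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Uses the predictor polynomial A(z) = 1 + sum_i a_i z^-i and returns the
    reflection coefficients k_1..k_order.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy; LPC analysis is undefined")
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # prediction error shrinks at every order
    return ks


def log_area_ratios(ks):
    """Map reflection coefficients (|k| < 1) to log area ratios.

    Assumed convention: LAR_i = log((1 - k_i) / (1 + k_i)).
    """
    return [math.log((1.0 - k) / (1.0 + k)) for k in ks]
```

A typical call would be `log_area_ratios(reflection_coefficients(autocorrelation(frame, p), p))` for a model order `p`, giving one `p`-dimensional feature vector per frame.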
Similar resources
Speaker Identification Based on Log Area Ratio and Gaussian Mixture Models in Narrow-Band Speech: Speech Understanding / Interaction
Log area ratio (LAR) coefficients derived from linear prediction coefficients (LPC) are a well-known feature extraction technique used in speech applications. This paper presents a novel way to use the LAR feature in a speaker identification system. Here, instead of using the mel frequency cepstral coefficients (MFCC), the LAR feature is used in a Gaussian mixture model (GMM) based speaker ident...
Speaker Identification Using Gaussian Mixture Models
In this paper, the performance of Perceptual Linear Prediction (PLP) features is compared with that of Linear Prediction Coefficient (LPC) features for speaker identification. Two classification techniques, Gaussian Mixture Models (GMM) and Vector Quantization (VQ) with Dynamic Time Warping (DTW), are used for classification of speakers based on their speech samples into respec...
Robust text-independent speaker identification using Gaussian mixture speaker models
This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterance ...
Text-Independent Speaker Recognition Using Gaussian Mixture Models Final Term Paper Proposal
The proposed project is an implementation of speaker recognition systems, covering both identification and verification. The systems are built using Gaussian Mixture Models, as proposed in several papers by Douglas A. Reynolds. The use of a Fractional Covariance Matrix is studied as a possible improvement to the traditional recognition systems. Keywords: speaker recognition; Gaussian Mixture Models; like...
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the richest and most immediate ways for human beings to express emotion, and it conveys cognitive and semantic concepts between people. In this study, a statistical method for emotion recognition from speech signals is proposed, and a learning approach based on the statistical model is introduced to classify the internal feelings of the utterance....
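The GMM-based, closed-set identification step that the abstract and several of the listed papers refer to can be sketched as follows. This is a minimal illustration under stated assumptions, not any of the papers' systems: each speaker model is assumed to be an already trained diagonal-covariance GMM stored as a list of (weight, mean, variance) tuples, training by expectation-maximization is omitted, and the function names are hypothetical. The identified speaker is simply the enrolled model with the highest average frame log-likelihood.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under one speaker's GMM.

    gmm is a list of (weight, mean, var) tuples; weights are assumed to sum to 1.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)
        # log-sum-exp keeps the mixture sum numerically stable
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def identify(frames, speaker_models):
    """Closed-set decision: return the enrolled speaker whose GMM scores highest."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, speaker_models[name]),
    )
```

Here `frames` would be the sequence of per-frame feature vectors (for example PLAR, LAR or MFCC vectors) extracted from the test utterance, and `speaker_models` a dictionary mapping each enrolled speaker name to its GMM.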